231 research outputs found

    Oversampling for Imbalanced Learning Based on K-Means and SMOTE

    Learning from class-imbalanced data continues to be a common and challenging problem in supervised learning as standard classification algorithms are designed to handle balanced class distributions. While different strategies exist to tackle this problem, methods that generate artificial data to achieve a balanced class distribution are more versatile than modifications to the classification algorithm. Such techniques, called oversamplers, modify the training data, allowing any classifier to be used with class-imbalanced datasets. Many algorithms have been proposed for this task, but most are complex and tend to generate unnecessary noise. This work presents a simple and effective oversampling method based on k-means clustering and SMOTE oversampling, which avoids the generation of noise and effectively overcomes imbalances between and within classes. Empirical results of extensive experiments with 71 datasets show that training data oversampled with the proposed method improves classification results. Moreover, k-means SMOTE consistently outperforms other popular oversampling methods. An implementation is made available in the Python programming language. Comment: 19 pages, 8 figures.
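    The abstract names three ingredients: k-means clustering of the input space, keeping clusters in which the minority class is well represented, and SMOTE interpolation inside those clusters. The following is a minimal pure-Python sketch of that idea for a binary problem, not the authors' published implementation. The farthest-first seeding, the imbalance-ratio filter (n_maj + 1) / (n_min + 1) <= ir_threshold, the sparsity weighting, and all function and parameter names are assumptions chosen for illustration.

```python
import math
import random


def farthest_first_centers(points, k, rng):
    # Simplified seeding (assumption): a random first center, then repeatedly
    # the point farthest from every center chosen so far.
    centers = [rng.choice(points)]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, rng, iters=50):
    centers = farthest_first_centers(points, k, rng)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: index of the nearest center for every point.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: move each non-empty cluster's center to its members' mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def smote(minority, n_new, k_neighbors, rng):
    # SMOTE: pick a minority sample, pick one of its k nearest minority
    # neighbours, and place a new sample at a random point on the segment.
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        a = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        b = rng.choice(sorted(others, key=lambda m: math.dist(a, m))[:k_neighbors])
        gap = rng.random()
        synthetic.append(tuple(ai + gap * (bi - ai) for ai, bi in zip(a, b)))
    return synthetic


def kmeans_smote(X, y, minority_label, k=2, ir_threshold=1.0, k_neighbors=2, seed=0):
    rng = random.Random(seed)
    labels = kmeans(X, k, rng)
    n_min = sum(t == minority_label for t in y)
    n_new = len(y) - 2 * n_min  # binary case: generate until both classes are equal
    # Filter step (assumed form): keep clusters with a low imbalance ratio.
    clusters = []
    for c in range(k):
        mins = [x for x, t, lab in zip(X, y, labels) if lab == c and t == minority_label]
        n_maj = sum(lab == c and t != minority_label for t, lab in zip(y, labels))
        if len(mins) >= 2 and (n_maj + 1) / (len(mins) + 1) <= ir_threshold:
            clusters.append(mins)
    if not clusters or n_new <= 0:
        return list(X), list(y)
    # Weighting step (assumed form): sparser minority clusters get more samples.
    dim = len(X[0])
    sparsity = []
    for mins in clusters:
        pairs = [math.dist(a, b) for i, a in enumerate(mins) for b in mins[i + 1:]]
        mean_dist = max(sum(pairs) / len(pairs), 1e-12)
        sparsity.append(mean_dist ** dim / len(mins))  # inverse of a density estimate
    total = sum(sparsity)
    counts = [int(n_new * s / total) for s in sparsity]
    counts[-1] += n_new - sum(counts)  # give any rounding remainder to the last cluster
    synthetic = []
    for mins, n in zip(clusters, counts):
        synthetic += smote(mins, n, k_neighbors, rng)
    return list(X) + synthetic, list(y) + [minority_label] * len(synthetic)


# Toy data: 25 majority points on a grid, 5 minority points far away.
majority = [(float(i), float(j)) for i in range(5) for j in range(5)]
minority = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0), (10.5, 10.5)]
X = majority + minority
y = [0] * len(majority) + [1] * len(minority)
X_res, y_res = kmeans_smote(X, y, minority_label=1)
```

    Because new samples are interpolated only between minority samples of the same retained cluster, they cannot land in regions dominated by the majority class, which is the noise-avoidance property the abstract emphasizes.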

    Oversampling for imbalanced learning based on k-means and SMOTE

    Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics. Learning from class-imbalanced data continues to be a common and challenging problem in supervised learning as standard classification algorithms are designed to handle balanced class distributions. While different strategies exist to tackle this problem, methods that generate artificial data to achieve a balanced class distribution are more versatile than modifications to the classification algorithm. Such techniques, called oversamplers, modify the training data, allowing any classifier to be used with class-imbalanced datasets. Many algorithms have been proposed for this task, but most are complex and tend to generate unnecessary noise. This work presents a simple and effective oversampling method based on k-means clustering and SMOTE oversampling, which avoids the generation of noise and effectively overcomes imbalances between and within classes. Empirical results of extensive experiments with 71 datasets show that training data oversampled with the proposed method improves classification results. Moreover, k-means SMOTE consistently outperforms other popular oversampling methods. An implementation is made available in the Python programming language.

    Characterizing digital microstructures by the Minkowski‐based quadratic normal tensor

    For material modeling of microstructured media, an accurate characterization of the underlying microstructure is indispensable. Mathematically speaking, the overall goal of microstructure characterization is to find simple functionals which describe the geometric shape as well as the composition of the microstructures under consideration and which enable distinguishing microstructures with distinct effective material behavior. For this purpose, we propose using Minkowski tensors in general, and the quadratic normal tensor in particular, and introduce a computational algorithm applicable to voxel-based microstructure representations. Rooted in the mathematical field of integral geometry, Minkowski tensors associate a tensor with fairly general geometric shapes, which makes them suitable for a wide range of microstructured material classes. Furthermore, they satisfy additivity and continuity properties, which makes them suitable and robust for large-scale applications. We present a modular algorithm for computing the quadratic normal tensor of digital microstructures. We demonstrate multigrid convergence for selected numerical examples and apply our approach to a variety of microstructures. Strikingly, the presented algorithm remains unaffected by inaccurate computation of the interface area. The quadratic normal tensor may be used for engineering purposes, such as mean-field homogenization, or as a target value for generating synthetic microstructures.
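    To make the object concrete, here is a minimal sketch of a quadratic normal tensor for the simplest possible case, a 2D polygon, where the interface integral of n ⊗ n reduces to a sum over edges weighted by edge length. This is not the voxel-based algorithm described above; the normalization by total interface length (so that the trace equals 1) is an assumption, since conventions for Minkowski tensors vary, and the function name is hypothetical.

```python
import math


def quadratic_normal_tensor(vertices):
    """Return (1/P) * sum_e L_e * n_e n_e^T for a closed 2D polygon.

    vertices: list of (x, y) corners in order; P is the perimeter.
    Normalizing by P (assumption) makes the trace equal to 1.
    """
    n = len(vertices)
    t = [[0.0, 0.0], [0.0, 0.0]]
    perimeter = 0.0
    for i in range(n):
        (x0, y0), (x1, y1) = vertices[i], vertices[(i + 1) % n]
        ex, ey = x1 - x0, y1 - y0
        length = math.hypot(ex, ey)
        # Unit normal of the edge; its sign does not matter for n n^T.
        nx, ny = ey / length, -ex / length
        for a, na in enumerate((nx, ny)):
            for b, nb in enumerate((nx, ny)):
                t[a][b] += length * na * nb
        perimeter += length
    return [[v / perimeter for v in row] for row in t]


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
rect = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0)]
N_square = quadratic_normal_tensor(square)  # ≈ [[0.5, 0], [0, 0.5]]
N_rect = quadratic_normal_tensor(rect)      # ≈ [[1/3, 0], [0, 2/3]]
```

    An isotropic interface gives equal eigenvalues (1/2 each in 2D), while the elongated rectangle puts more weight on the direction normal to its long edges, which is the kind of orientation information the tensor is meant to capture.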

    Simulation of aperture-optimised refractive lenses for hard X-ray full field microscopy

    The aperture of refractive X-ray lenses is limited by absorption and geometry. We introduce a specific simulation method to develop an aperture-optimized lens design for hard X-ray full-field microscopy. The aperture-optimized lens, referred to as the Taille-lens, allows for high spatial resolution as well as homogeneous image quality. This is achieved by individually adapting the apertures of the hundreds of lens elements in an X-ray imaging lens to the respective microscopy setup. For full-field microscopy, the simulations yield lenses with large entrance and exit apertures and lens elements with smaller apertures in the middle of the lens.

    Determination of the packing fraction in photonic glass using synchrotron radiation nanotomography

    Photonic glass is a material class that can be used for photonic broadband reflectors, for example as thermal barrier coating films in the infrared regime. Photonic properties such as reflectivity depend on the ordering and the material packing fraction over the complete film thickness of up to 100 μm. Nanotomography allows these key parameters to be acquired non-destructively throughout the sample volume at the required resolution. A nanotomography measurement of a photonic glass film at the PETRA III beamline P05 was used to analyze the packing fraction throughout the complete sample thickness. The results showed a packing fraction significantly smaller than the expected random close packing, providing important information for improving the fabrication and processing of photonic glass materials in the future.
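    Once a tomogram has been reconstructed and segmented, the packing fraction is simply the share of voxels occupied by particle material, and computing it slice by slice along the thickness gives a depth profile. The sketch below assumes an already binarized volume indexed as volume[z][y][x] with z running through the film thickness (1 = particle, 0 = pore); reconstruction, segmentation, and edge effects are outside its scope, and the function names are hypothetical. The comparison value of about 0.64 is the commonly quoted random close packing fraction for monodisperse spheres.

```python
# Approximate random close packing fraction of monodisperse hard spheres.
RANDOM_CLOSE_PACKING = 0.64


def packing_fraction_profile(volume):
    """Packing fraction of each z-slice of a binary volume[z][y][x]."""
    profile = []
    for slice_ in volume:
        solid = sum(sum(row) for row in slice_)    # voxels labeled 1
        total = sum(len(row) for row in slice_)    # all voxels in the slice
        profile.append(solid / total)
    return profile


def overall_packing_fraction(volume):
    """Packing fraction of the whole segmented volume."""
    solid = sum(sum(row) for slice_ in volume for row in slice_)
    total = sum(len(row) for slice_ in volume for row in slice_)
    return solid / total


# Tiny illustrative volume: two 2x2 slices.
volume = [
    [[1, 1], [1, 0]],
    [[1, 0], [0, 0]],
]
profile = packing_fraction_profile(volume)   # [0.75, 0.25]
phi = overall_packing_fraction(volume)       # 0.5
below_rcp = phi < RANDOM_CLOSE_PACKING       # True
```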

    Indicators for the on-farm assessment of crop cultivar and livestock breed diversity: a survey-based participatory approach

    Agrobiodiversity plays a fundamental role in guaranteeing food security. However, little is still known about the diversity within crop and livestock species, that is, their genetic diversity. In this paper we present a set of farm-level indicators of crop accession and breed diversity for different farm types, which may provide a useful tool for assessing and monitoring farming-system agrobiodiversity in a feasible and relatively affordable way. A generic questionnaire was developed to capture information on crops and livestock through 203 on-farm interviews in 12 European case study regions and in Uganda. Through a participatory approach involving a number of stakeholders, eight potential indicators were selected and tested. Five of them are recommended as potentially useful indicators for per-farm agrobiodiversity monitoring: (1) crop-species richness (up to 16 crop species), (2) crop-cultivar diversity (up to 15 crop cultivars, 1-2 on average), (3) type of crop accessions (landraces accounted for 3 % of all crop cultivars in Europe and 31 % in Uganda), (4) livestock-species diversity (up to 5 livestock species), and (5) breed diversity (up to five cattle and eight sheep breeds, 1-2 on average). We demonstrated that the selected indicators are able to detect differences between farms, regions and dominant farm types. Given the present rate of agrobiodiversity loss and the dramatic effects it may have on food production and food security, extensive monitoring is urgent. A consistent on-farm survey of crop cultivars and livestock breeds will detect losses and help improve strategies for the management and conservation of on-farm genetic resources.
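    The crop indicators listed above are essentially counts and shares over survey records. The following sketch shows one way to compute three of them per farm from hypothetical interview rows of the form (farm_id, crop_species, cultivar, is_landrace); the row layout, field names, and the reading of "cultivars per species" as an average are assumptions, not the questionnaire's actual schema. Livestock-species and breed diversity could be computed the same way from livestock rows.

```python
from collections import defaultdict

# Hypothetical survey rows: (farm_id, crop_species, cultivar, is_landrace).
survey = [
    ("farm_A", "wheat", "cultivar_1", False),
    ("farm_A", "wheat", "cultivar_2", True),
    ("farm_A", "maize", "cultivar_3", False),
    ("farm_B", "potato", "cultivar_4", False),
]


def crop_indicators(rows):
    """Per-farm crop indicators computed as simple counts and shares."""
    species = defaultdict(set)
    cultivars = defaultdict(set)
    landraces = defaultdict(set)
    for farm, crop, cultivar, is_landrace in rows:
        species[farm].add(crop)
        cultivars[farm].add((crop, cultivar))  # identify a cultivar within its species
        if is_landrace:
            landraces[farm].add((crop, cultivar))
    return {
        farm: {
            "crop_species_richness": len(species[farm]),
            "crop_cultivar_count": len(cultivars[farm]),
            "cultivars_per_species": len(cultivars[farm]) / len(species[farm]),
            "landrace_share": len(landraces[farm]) / len(cultivars[farm]),
        }
        for farm in species
    }


indicators = crop_indicators(survey)
# farm_A: 2 species, 3 cultivars, 1.5 cultivars per species, landrace share 1/3
```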

    Indicators for recording genetic diversity in organic and non-organic farming systems

    Genetic variability is the foundation of life. Large genetic variability within species is the basis for adaptation to changing environmental conditions. Farmers and breeders have developed a multitude of crop cultivars and animal breeds to stabilize and increase quality and productivity. This study evaluated genetic diversity within different organic and non-organic farming systems, using crop-cultivar and livestock-breed information as simple indicators. Data were collected through on-farm surveys in 15 case study regions in Europe and beyond. The selected indicators revealed strong differences in cultivar diversity between countries and farming systems across Europe. No or only small differences were detectable between organic and non-organic farming systems. Landraces, as on-farm genetic resources, were under-represented in the European case study regions.

    BIOBIO – Indicators for biodiversity in organic and extensive cropping systems

    Organic and low-input farming systems provide habitats for wildlife on farmland. The EU FP7 project BIOBIO has identified a core set of 23 indicators relating to the diversity of habitats, species, crops and livestock. Management indicators capturing the pressure on biodiversity are also proposed. The indicators were identified in an iterative process between scientists and stakeholders to ensure that they are not only scientifically sound but also practicable and attractive. They were tested on four major farm types in 12 case study regions. Allocating 0.25 % of the CAP budget to farm-scale biodiversity monitoring would allow the indicators to be measured and analysed on 50,000 farms across Europe.